Longitudinal study of ASR performance on ageing voices
Abstract
This paper presents the results of a longitudinal study of automatic speech recognition (ASR) performance on ageing voices. Experiments were conducted on audio recordings of the proceedings of the Supreme Court of the United States (SCOTUS). Results show that ASR word error rates (WERs) for elderly voices are significantly higher than those for adult voices. The WER increases gradually as the age of the elderly speakers increases. Maximum likelihood linear regression (MLLR) based speaker adaptation improves the WER on ageing voices, although performance remains considerably worse than on adult voices. Speaker adaptation does, however, reduce the increase in WER with age during old age.
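For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recogniser output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration; the abstract does not describe the scoring tool or text normalisation used in the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenisation is plain whitespace splitting, which is
    an assumption and not necessarily what the study used.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, an "absolute" difference in WER between two speaker groups is simply the difference between the two rates, expressed in percentage points.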
Similar resources
Ageing Voices: The Effect of Changes in Voice Parameters on ASR Performance
With ageing, human voices undergo several changes, typically characterized by increased hoarseness and altered articulation patterns. In this study, we have examined the effect on Automatic Speech Recognition (ASR) and found that the Word Error Rate (WER) on older voices is about 9% absolute higher than that on adult voices. Subsequently, we compared several voice source pa...
Impact of Age in ASR for the Elderly: Preliminary Experiments in European Portuguese
Standard automatic speech recognition (ASR) systems use acoustic models typically trained with speech of young adult speakers. Ageing is known to alter speech production in ways that require ASR systems to be adapted, in particular at the level of acoustic modeling. This paper reports ASR experiments that illustrate the impact of speaker age on speech recognition performance. A large read speec...
Template-based ASR using posterior features and synthetic references: comparing different TTS systems
In recent works, the use of phone class-conditional posterior probabilities (posterior features) directly as features has provided successful results in template-based ASR systems. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features toward undesired variability, we investigate the use of synthetic speech to generate reference t...
Synthetic References for Template-based ASR using posterior features
Recently, the use of phoneme class-conditional probabilities as features (posterior features) for template-based ASR has been proposed. These features have been found to generalize well to unseen data and yield better systems than standard spectral-based features. In this paper, motivated by the high quality of current text-to-speech systems and the robustness of posterior features toward undesi...
Development of New Telephone Speech Databases for French: the NEOLOGOS Project
The NEOLOGOS project is a speech database creation project for the French language, resulting from a collaboration between French universities and industrial companies, and supported by the French Ministry for Research. The goal of NEOLOGOS is to create new kinds of speech databases: firstly, a 1000-speaker telephone database of children’s voices, called PAIDIALOGOS, following the SpeechDat g...